{"nbformat":4,"nbformat_minor":0,"metadata":{"anaconda-cloud":{},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"Tutorial IV.ipynb","provenance":[],"collapsed_sections":["l7s9QvYetmcY","BGrGmp-5tmcV","uwVORI1Vtmca","zljWXlXquHgp","UPrth-Ertmck","N_DKz0sKtmco","FPzKLLPitmcr"],"toc_visible":true}},"cells":[{"cell_type":"markdown","metadata":{"id":"1oZByfpftmcT","colab_type":"text"},"source":["# Tutorial IV: Convolutions"]},{"cell_type":"markdown","metadata":{"id":"N__G0gDAtmcU","colab_type":"text"},"source":["<p>\n","Bern Winter School on Machine Learning, 27-31 January 2020<br>\n","Prepared by Mykhailo Vladymyrov.\n","</p>\n","\n","This work is licensed under a <a href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>."]},{"cell_type":"markdown","metadata":{"id":"knW3JBEDtmcU","colab_type":"text"},"source":["In this session we will look at the convolutoin operation and try to build some intuition about it.\n","Also we will look at one of the state-of-the art deep models, [Inception](https://arxiv.org/abs/1602.07261). It is designed to perform image recognition."]},{"cell_type":"markdown","metadata":{"id":"l7s9QvYetmcY","colab_type":"text"},"source":["## 1. Load necessary libraries"]},{"cell_type":"code","metadata":{"id":"3G6x1ENecsyd","colab_type":"code","colab":{}},"source":["# if using google colab\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"cjsvvAJatmcY","colab_type":"code","colab":{}},"source":["import sys\n","import os\n","\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import IPython.display as ipyd\n","import tensorflow.compat.v1 as tf\n","tf.disable_v2_behavior()\n","from PIL import Image\n","\n","# We'll tell matplotlib to inline any drawn figures like so:\n","%matplotlib inline\n","plt.style.use('ggplot')\n","\n","\n","from IPython.core.display import HTML\n","HTML(\"\"\"<style> .rendered_html code { \n","    padding: 2px 5px;\n","    color: #0000aa;\n","    background-color: #cccccc;\n","} </style>\"\"\")"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"BGrGmp-5tmcV","colab_type":"text"},"source":["### Download libraries"]},{"cell_type":"code","metadata":{"id":"nL1BzlxC5PWy","colab_type":"code","colab":{}},"source":["p = tf.keras.utils.get_file('./material.tgz', 'https://scits-training.unibe.ch/data/tut_files/t4.tgz')\n","!mv {p} .\n","!tar -xvzf material.tgz > /dev/null  2>&1"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"vOpPZ2Hf5aB8","colab_type":"code","colab":{}},"source":["from utils import gr_disp\n","from utils import inception"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uwVORI1Vtmca","colab_type":"text"},"source":["## 2. Convolutions"]},{"cell_type":"markdown","metadata":{"id":"o0i6LmfYtmcb","colab_type":"text"},"source":["In fully connected network all inputs of a layer are connected to all neurons of the following layer:\n","<tr>\n","    <td> <img src=\"https://scits-training.unibe.ch/data/figures/Perceptron.png\" alt=\"drawing\" width=\"30%\"/></td> \n","    <td> <img src=\"https://scits-training.unibe.ch/data/figures/MLP.png\" alt=\"drawing\" width=\"50%\"/></td> \n","</tr> \n","<br>In convolutional nets the same holds for each neighbourhood, and the weights are shared:<br>\n","<img src=\"https://scits-training.unibe.ch/data/figures/CNN1.png\" alt=\"drawing\" width=\"50%\"/><br>\n","<img src=\"https://scits-training.unibe.ch/data/figures/CNN2.png\" alt=\"drawing\" width=\"50%\"/><br>\n","<img src=\"https://scits-training.unibe.ch/data/figures/CNN3.png\" alt=\"drawing\" width=\"50%\"/><br>\n"]},{"cell_type":"markdown","metadata":{"id":"D_nODHgbtmcb","colab_type":"text"},"source":["Let's see what a convolution is, and how it behaves."]},{"cell_type":"code","metadata":{"id":"dH2IPjiftmcc","colab_type":"code","colab":{}},"source":["#load image, convert to gray-scale and normalize\n","img_raw = plt.imread('ML3/chelsea.jpg').mean(axis=2)[-256:, 100:356].astype(np.float32)\n","img_raw = (img_raw-img_raw.mean())/img_raw.std()\n","\n","plt.imshow(img_raw, cmap='gray')\n","plt.grid(False)\n","img_raw4d = img_raw[np.newaxis,...,np.newaxis]"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Zi3rfvEitmce","colab_type":"code","colab":{}},"source":["g = tf.Graph()\n","with g.as_default():\n","    #convolve x 5 times with a 5x5 filter\n","    x = tf.placeholder(dtype=tf.float32, shape=(1,256,256,1),name='img')\n","    flt = tf.placeholder(dtype=tf.float32, shape=(5,5,1,1), name='flt')\n","    y1 = tf.nn.conv2d(x , flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved') #[1,2,2,1]\n","    y2 = tf.nn.conv2d(y1, flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved')\n","    y3 = tf.nn.conv2d(y2, flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved')\n","    y4 = tf.nn.conv2d(y3, flt, strides=[1,1,1,1], dilations=[1, 1, 1, 1], padding='VALID', name='convolved')\n","    "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"DOEB24sLtmcg","colab_type":"code","colab":{}},"source":["flt_mtx = [\n","    [ 0, 0, 0, 0, 0,],\n","    [ 0, 0, 0, 0, 0,],\n","    [ 0, 0, 1, 0, 0,],\n","    [ 0, 0, 0, 0, 0,],\n","    [ 0, 0, 0, 0, 0,],\n","]\n","\n","with tf.Session(graph=g) as sess:\n","    flt_mtx_np = np.array(flt_mtx, np.float32)\n","    flt_mtx_np = flt_mtx_np[..., np.newaxis, np.newaxis]\n","    res = sess.run([x,y1,y2,y3,y4], feed_dict={x:img_raw4d, flt:flt_mtx_np})\n","res = [r[0,...,0] for r in res]\n","\n","\n","n = len(res)\n","fig, ax = plt.subplots(1, n+1, figsize=(n*4, 4))\n","for col in range(n):\n","    ax[col].imshow(res[col], cmap='gray')\n","    ax[col].grid(False)\n","    ax[col].set_title('conv %d'% col if col else 'raw')\n","\n","ax[n].imshow(flt_mtx, cmap='gray')\n","ax[n].grid(False)\n","_=ax[n].set_title('filter')"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"Van15HojduYJ","colab_type":"code","colab":{}},"source":["def gaussian(n=5):\n","  x = np.linspace(-3, 3, n)\n","  y = np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi)\n","  return y\n","\n","def dgaussian(n=5):\n","  x = np.linspace(-3, 3, n)\n","  y = - 2 * x * np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi)\n","  return y\n","\n","def ddgaussian(n=5):\n","  x = np.linspace(-3, 3, n)\n","  y = - 2 * (2*x**2 - 1) * np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi)\n","  return y\n","  \n","def ddgaussian2d(n=5):\n","  c = np.linspace(-3, 3, n)\n","  r = np.asarray([[np.sqrt(xi**2+yi**2) for xi in c] for yi in c])\n","  f = lambda x: (- 2 * (2*x**2 - 1) * np.exp(-x**2 * 0.5) / np.sqrt(2*np.pi))\n","\n","  y = f(r)\n","  y -= y.mean()\n","  return y\n","  "],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"OljkMi3jgkh1","colab_type":"code","colab":{}},"source":["n = 30\n","gf = np.tile(gaussian(n)[np.newaxis], [n, 1])\n","\n","dgf = np.tile(dgaussian(n)[np.newaxis], [n, 1])\n","\n","ddgf = ddgaussian(n)\n","ddgf -= ddgf.mean()\n","ddgf = np.tile(ddgf[np.newaxis], [n, 1])\n","\n","ddgf2d = ddgaussian2d(n)\n","rf2d = lambda:  np.random.normal(size=(5,5))\n","\n","\n","plt.plot(gf[0])\n","plt.plot(dgf[0])\n","plt.plot(ddgf[0])\n"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"-wlTUAf60RXF","colab_type":"code","colab":{}},"source":["plt.imshow(dgf*gf.transpose())\n","plt.grid(False)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"zljWXlXquHgp","colab_type":"text"},"source":["## 3. Homework"]},{"cell_type":"markdown","metadata":{"id":"bZBfo0m40vB6","colab_type":"text"},"source":["In last session we used fully connected network to clasify digits.\n","Try to build the convolutional network: use three convolutional layers, then flatten the ouput and apply 1 fully connected.\n","You can use the following helper function. Notice: there is a stride parameter. It allows to effectively downscale the feature maps.\n","To get an understanding of different convolution types, check the <a href=\"https://github.com/vdumoulin/conv_arithmetic\">animations here</a>."]},{"cell_type":"code","metadata":{"code_folding":[0],"id":"5BoKD4jltmci","colab_type":"code","colab":{}},"source":["def conv_2D(x, n_output_ch,\n","            k=3,\n","            s=1,\n","            activation=tf.nn.relu,\n","            padding='VALID', name='conv2d', reuse=None\n","           ):\n","    \"\"\"\n","    Helper for creating a 2d convolution operation.\n","\n","    Args:\n","        x (tf.Tensor): Input tensor to convolve.\n","        n_output_ch (int): Number of filters.\n","        k (int): Kernel width and height\n","        s (int): Stride in x and y\n","        activation (tf.Function): activation function to apply to the convolved data\n","        padding (str): Padding type: 'SAME' or 'VALID'\n","        name (str): Variable scope\n","        reuse (tf.Flag): Flag whether to use existing variable. Can be False(None), True, or tf.AUTO_REUSE\n","\n","    Returns:\n","        op (tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor): Output of activation, convolution, weights, bias\n","    \"\"\"\n","    with tf.variable_scope(name or 'conv2d', reuse=reuse):\n","        w = tf.get_variable(name='W',\n","                            shape=[k, k, x.get_shape()[-1], n_output_ch],\n","                            initializer=tf.initializers.he_uniform()\n","                           )\n","        \n","        wx = tf.nn.conv2d(name='conv',\n","                          input=x, filter=w,\n","                          strides=[1, s, s, 1],\n","                          padding=padding\n","                         )\n","        \n","        b = tf.get_variable(name='b',\n","                            shape=[n_output_ch], initializer=tf.initializers.constant(value=0.0)\n","                           )\n","        h = tf.nn.bias_add(name='h',\n","                           value=wx,\n","                           bias=b\n","                          )\n","\n","        if activation is not None:\n","            x = activation(h, name=activation.__name__)\n","        else:\n","            x = h\n","    \n","    return x, w"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"OM-OC3L4wyTn","colab_type":"text"},"source":["You can start with something like this:\n"]},{"cell_type":"code","metadata":{"id":"zl3n54VPw-C5","colab_type":"code","colab":{}},"source":["...\n","x_train = x_train_2d[..., np.newaxis]  # we need additional channel dimension\n","\n","....\n","\n","X = tf.placeholder(name='X', dtype=tf.float32, shape=[None, n_input])\n","\n","L1, W1 = conv_2D(X, 16, name = 'C1')\n","L2, W2 = conv_2D(L1, 32, s=2, name = 'C2')\n","L3, W3 = conv_2D(L2, 32, s=2, name = 'C3')\n","\n","L3_f = tf.keras.layers.Flatten(L3)\n","\n","L4, W4 = fully_connected_layer(L3_f , 32, 'F1', activation=tf.nn.relu)\n","L5, W5 = fully_connected_layer(L4 , 10, 'F2')\n","\n","Y_onehot = tf.nn.softmax(L5, name='Logits')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"oEnrC5c0z-Ci","colab_type":"text"},"source":["Play with layer parameters. Can you get better performance than in fully connected network?"]},{"cell_type":"markdown","metadata":{"id":"UPrth-Ertmck","colab_type":"text"},"source":["## 4. Load the model"]},{"cell_type":"markdown","metadata":{"id":"U5v3hUe4tmck","colab_type":"text"},"source":["inception module here is a small module that performs loading the inception model as well as image preparation for the training."]},{"cell_type":"code","metadata":{"id":"q5uQgNCwtmcl","colab_type":"code","colab":{}},"source":["net, net_labels = inception.get_inception_model()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"pDfUcRvRtmcn","colab_type":"code","colab":{}},"source":["#get model graph definition and change it to use GPU\n","gd = net\n","\n","str_dg = gd.SerializeToString()\n","#uncomment next line to use GPU acceleration\n","#str_dg = str_dg.replace(b'/cpu:0', b'/gpu:0') #a bit extreme approach, but works =)\n","gd = gd.FromString(str_dg)\n","\n","gr_disp.show(gd)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"N_DKz0sKtmco","colab_type":"text"},"source":["## 5. Create the graph"]},{"cell_type":"markdown","metadata":{"id":"4klBSAR9tmcp","colab_type":"text"},"source":["This whole model won't fit in GPU memory. We will take only the part from input to the main output and copy it to a second graph, that we will use further."]},{"cell_type":"code","metadata":{"id":"W5O-bSOhtmcp","colab_type":"code","colab":{}},"source":["gd2 = tf.graph_util.extract_sub_graph(gd, ['output'])\n","g2 = tf.Graph() # full graph\n","with g2.as_default():\n","    tf.import_graph_def(gd2, name='inception')\n","\n","gr_disp.show(g2.as_graph_def())"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"FPzKLLPitmcr","colab_type":"text"},"source":["## 6. Test the model"]},{"cell_type":"markdown","metadata":{"id":"QJIfPj7Wtmcs","colab_type":"text"},"source":["We will use one image to check model. `img_preproc` is croped to 256x256 pixels and slightly transformed to be used as imput for the model using `inception.prepare_training_img`. `inception.training_img_to_display` is then used to convert it to displayable one.\n"]},{"cell_type":"code","metadata":{"id":"k9HRNGmstmcs","colab_type":"code","colab":{}},"source":["img_raw = plt.imread('ML3/chelsea.jpg')\n","img_preproc = inception.prepare_training_img(img_raw)\n","img_deproc = inception.training_img_to_display(img_preproc)\n","_, axs = plt.subplots(1, 2, figsize=(10,5))\n","axs[0].imshow(img_raw)\n","axs[0].grid(False)\n","axs[1].imshow(img_deproc)\n","axs[1].grid(False)\n","plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"C7fpmx0btmcu","colab_type":"text"},"source":["We then get the input and output tensors, and obtain probabilities of each class on this image:"]},{"cell_type":"code","metadata":{"id":"nLrIdxsAtmcu","colab_type":"code","colab":{}},"source":["# From graph we will get the input and output tensors. \n","# Any tensor and operation can be obtained by name\n","g2.device('/gpu:0')\n","with g2.as_default():\n","    x = g2.get_tensor_by_name('inception/input:0')\n","    softmax = g2.get_tensor_by_name('inception/output:0')\n","    \n","# Then we will feed the image in the graph and print 5 classes that have highest probability\n","with tf.Session(graph=g2) as sess:\n","    res = np.squeeze(sess.run(softmax, feed_dict={x: img_preproc[np.newaxis]}))\n","    \n","    indexes_sorted_by_probability = res.argsort()[::-1]\n","    print([(res[idx], net_labels[idx])\n","           for idx in indexes_sorted_by_probability[:5]])"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"55kW3cf7trfs","colab_type":"code","colab":{}},"source":[""],"execution_count":0,"outputs":[]}]}